A COMPARISON OF COMPUTERIZED TECHNIQUES FOR
RECOGNIZING SPANISH NAMES 



INTRODUCTION 

This research makes use of Air Force survey data to show the relationships between
those persons who classified themselves as Spanish and those persons whose names would
be treated as Spanish by various computerized coding techniques. For each coding
technique an estimate is calculated for:

(1) The proportion of persons with Spanish names who did not classify
themselves as Spanish.

(2) The proportion of persons who classified themselves as Spanish who did
not have Spanish names.

These estimates are calculated for the United States as a whole and for several broad
groupings by geographic area, age, educational attainment, and percentile on the Armed
Forces Qualification Test (AFQT).



BACKGROUND AND OBJECTIVES OF THE STUDY 

Since the Census Bureau's use of Spanish surnames in the 1950 Census, Spanish
names have been increasingly used to identify persons of Spanish culture in the United
States. The 1960 and 1970 Censuses again made use of Spanish surnames, as did the
Census report of Minority-Owned Businesses: 1969 (1). An unknown but growing number
of research studies (e.g., 2-6) have utilized Spanish surnames as a means of classifying
data by ethnic group. In addition, Title 7 of the Civil Rights Act of 1964 requires that
employers of 25 or more persons report the number of employees with Spanish surnames
in each position held in the company. In some cases (7) the Spanish minorities in the
United States are now referred to as "Spanish-surnamed" individuals, as if the name
rather than the culture were the important group-defining characteristic.

Despite this widespread use of the Spanish surname as a surrogate variable, there are
no recent studies which attempt to show the degree of correspondence between Spanish-
surnamed individuals and individuals who belong to a Spanish cultural group. The present
study attempts to remedy this deficiency within certain limitations imposed by the data.



PLAN OF THE STUDY 

The data are taken from the U.S. Air Force Airman Sample Survey of March 1971, 
and the Air Force Master File of male enlisted personnel as of 30 June 1971. 

An airman survey is performed triannually to answer a wide variety of questions of 
interest to the Air Force. The March 1971 survey consisted of a mark sense scanner form 
which a 5% sample of airmen were asked to complete during duty hours. The questionnaire 
contained 143 questions and was completed by 29,000 airmen. Excluding airmen on leave, 
the response to the survey was approximately 90%. 




The questionnaire included the following items:

Item                                   Percent Complete

Social Security Number                       07.0
Air Force Specialty Code
  (Questions 14 and 15)                      07.5
Ethnic Question (Question 52)                00.7



The wording of the ethnic question was as follows:

"Which of the following do you consider yourself?"

A. Negro/Black 

B. Spanish or Mexican American 

C. American Indian 

D. Oriental 

E. White 

F. Other 

The individual's name, geographic area, age, and Armed Forces Qualification Test
(AFQT) percentile were obtained from the Air Force Master File rather than from the
survey data. Linkage to the master file required matches on both the social security
number and the Air Force specialty code, as well as a valid ethnic code in the survey
data. There remained 22,103 cases for analysis. Eliminating the requirement for a match
on the Air Force specialty code would have left 25,351 cases for analysis; however, the
quality of the data would have been substantially lower. The items extracted from the
master file were as follows:



Item                                   Percent Complete
                                     (for matching cases)

Name                                        100.0
Educational Level                            00.0
Home State                                   71.1
Armed Forces Qualification Test              63.4
Birth Date                                   00.4



PROCEDURE 

The basic computerized technique for classifying names as Spanish or non-Spanish is
to sort the names alphabetically and to compare the sorted cases against entries on a file
of Spanish surnames (which is also sorted alphabetically). If an individual's name appears
in the Spanish-surname file, his name is classified as Spanish. With this approach the
names of the surveyed individuals were classified as Spanish or non-Spanish using each of
the following lists:

(1) Census surnames (8). 

(2) Morton surnames (9), a list prepared by Dr. William E. Morton.

(3) "Broad" Spanish surnames, a list prepared by the author and Dr. Santiago
Rodriguez^ by adding to a census list names selected from a file of men
separated from the Army. A preliminary selection was made by listing the
names of persons who either lived in selected zipcode areas or who had
Spanish first names. The final selection was made manually by Dr.
Rodriguez.

(4) "Narrow" Spanish surnames, a subset of the "broad" surnames, developed
chiefly by Dr. Rodriguez. Names which occur frequently in non-Spanish
cultural groups were excluded.

^ Dr. Rodriguez is on the staff of the Equal Opportunity Commission.

In addition, an ingenious technique for recognizing Spanish surnames has been
developed by Dr. Robert Buechley (10, 11). This technique is based on surname endings
and letter combinations. This technique will be referred to as the:

(5) Buechley technique. 

Two further procedures classify an individual as Spanish or non-Spanish based upon 
his first name. These do not require a separate sort of the file, since the list of first 
names is short enough to be stored in the computer memory and accessed randomly 
using a search procedure. This approach was used with the following two name lists: 

(6) "Broad" Spanish first names, a list of male names developed from a file of
Army separatees. The first names of individuals having Spanish surnames
were collected. The resulting list was screened by Dr. Rodriguez to
eliminate the non-Spanish first names.

(7) "Narrow" Spanish first names, a subset of the "broad" Spanish first names
developed by Dr. Rodriguez. "Broad" first names which occur frequently
in non-Spanish cultures were eliminated.

Finally, it is possible to classify an individual as Spanish or non-Spanish based upon 
different combinations of the above criteria. For example, we might require that an 
individual have both a narrow Spanish surname and a narrow Spanish first name before 
classifying the individual as Spanish. 

Given these classification schemes and the survey data, it is possible to compare the 
classification schemes with how the individuals classified themselves. 
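The list-lookup and combined criteria described above can be sketched in a few lines. This is only an illustration: the three name lists below are tiny hypothetical stand-ins (not the Census, Morton, or Rodriguez lists), the function names are invented for the example, and an in-memory set lookup is used in place of the sort-and-compare pass against a tape file.

```python
# Illustrative name lists only; these are hypothetical stand-ins, not the
# Census, Morton, or Rodriguez lists used in the study.
NARROW_SURNAMES = frozenset({"GARCIA", "RODRIGUEZ", "MARTINEZ"})
BROAD_SURNAMES = NARROW_SURNAMES | {"MARTIN", "LEON"}
NARROW_FIRST_NAMES = frozenset({"JESUS", "SANTIAGO"})


def on_list(name, name_list):
    """Return True if the normalized name appears on the given list."""
    return name.strip().upper() in name_list


def narrow_surname(surname, first_name):
    """Simple criterion: the surname is on the narrow list."""
    return on_list(surname, NARROW_SURNAMES)


def narrow_surname_and_first_name(surname, first_name):
    """Combined criterion: both surname and first name must be narrow."""
    return on_list(surname, NARROW_SURNAMES) and on_list(
        first_name, NARROW_FIRST_NAMES
    )


def narrow_or_first_with_broad(surname, first_name):
    """Narrow surname, OR (narrow first name AND broad surname)."""
    return on_list(surname, NARROW_SURNAMES) or (
        on_list(first_name, NARROW_FIRST_NAMES)
        and on_list(surname, BROAD_SURNAMES)
    )
```

Each criterion takes the same two arguments, so any of them can be applied to the same file of names and compared against the respondents' own answers.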



RESULTS OF THE STUDY 

FALSE CLASSIFICATIONS 

A comparison of the different classification schemes is given in Table 1. To simplify 
presentation, it is assumed in the tables that an individual's classification of himself is 
correct.^ Those cases "falsely classified as Spanish" in Table 1 are individuals who 
completed something besides "Spanish or Mexican American" on the ethnic question but 
whose names were treated as Spanish by a given classification technique. Similarly, those 
cases "falsely classified as non-Spanish" had entries of "Spanish or Mexican American" 
on the questionnaire, but their names were not considered Spanish by another 
classification technique. 
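As a rough sketch of how the two error percentages are formed, the function below tallies pairs of (self-classification, name classification). The function and key names are hypothetical; the denominators follow the definitions used for Table 1 (persons classified as Spanish by name, and persons who classified themselves as Spanish).

```python
def false_classification_summary(records):
    """Tally false classifications.

    records: iterable of (self_spanish, name_spanish) boolean pairs, where
    self_spanish is the respondent's own answer and name_spanish is the
    result of a name classification technique.
    """
    n_name = n_self = false_spanish = false_non_spanish = 0
    for self_spanish, name_spanish in records:
        n_name += bool(name_spanish)
        n_self += bool(self_spanish)
        if name_spanish and not self_spanish:
            false_spanish += 1          # Spanish name, not self-classified
        elif self_spanish and not name_spanish:
            false_non_spanish += 1      # self-classified, name not Spanish
    return {
        "classified_as_spanish": n_name,
        "self_classified_spanish": n_self,
        "false_spanish": false_spanish,
        "false_non_spanish": false_non_spanish,
        # Percent of name-classified persons who did not self-classify.
        "pct_false_spanish": 100.0 * false_spanish / n_name if n_name else 0.0,
        # Percent of self-classified persons missed by the name technique.
        "pct_false_non_spanish": (
            100.0 * false_non_spanish / n_self if n_self else 0.0
        ),
    }
```

Feeding in counts equivalent to the "narrow" surname line of Table 1 (230 falsely Spanish among 814 classified as Spanish, 119 falsely non-Spanish among 703 self-classified) gives 28.3% and 16.9%, the values printed in the table.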

INCLUSIVENESS VERSUS EXCLUSIVENESS

In Table 1 it is possible to see obvious tradeoffs between including as many as
possible who can reasonably be classified as Spanish and excluding all those who should
not be classified as Spanish. For most statistical purposes, the latter is the more
important criterion. It is possible to correct for undercounts, but there is no way of
correcting a cross-tabulation biased by a substantial number of individuals misclassified by
cultural group.

^ As we will see, the assumption is not always valid. 
Table 1

Number and Percent of Persons Falsely Classified as
Spanish or Non-Spanish, by Classification Procedure

                                        Number         Persons Falsely Classified
                                      Classified   As Spanish   As Non-Spanish      Total
    Classification Procedure          As Spanish    N     %^a     N     %^b       N     %^c

 1. "Broad" Spanish surname              1,025      420   41.0     98   13.9      518    2.3
 2. "Narrow" Spanish surname               814      230   28.3    119   16.9      349    1.6
 3. Census Spanish surname                 917      350   38.2    136   19.4      486    2.2
 4. Morton Spanish surname                 974      391   40.1    120   17.1      511    2.3
 5. Buechley technique                   1,163      550   47.3     90   12.8      640    2.9
 6. Any of the above                     1,436      807   56.2     74   10.5      881    4.0
 7. All the above                          733      179   24.4    149   21.2      328    1.5
 8. "Broad" Spanish first name             732      393   53.7    364   51.8      757    3.4
 9. "Narrow" Spanish first name            332       78   23.5    449   63.9      527    2.4
10. Any of the above                     1,767    1,119   63.3     55    7.8    1,174    5.3
11. All the above                          246       29   11.8    486   69.1      515    2.3
12. "Narrow" surname OR ("broad"
    first name and "broad" surname)        822      237   28.8    118   16.8      355    1.6
13. "Narrow" surname OR
    ("narrow" first name)                  885      275   31.1     93   13.2      368    1.7
14. "Narrow" surname OR ("narrow"
    first name and "broad" surname)        824      232   28.2    111   15.8      343    1.5
15. "Narrow" surname OR ("narrow"
    first name and Morton surname)         825      232   28.1    110   15.7      342    1.5
16. "Narrow" surname OR ("narrow"
    first name and Buechley surname)       837      241   28.8    107   15.2      348    1.6

^a Denominator used for these percentages was the number of persons classified as Spanish by the various coding techniques.
^b Denominator used for these percentages was the number of persons who classified themselves as Spanish, 703.
^c Denominator used for these percentages was the number of persons included in the survey, 22,193.

There are, however, limits to how exclusively we can define the Spanish group. The 
requirement that an individual meet all the name criteria (Table 1, line 11) resulted in 
only 11.8% misclassified as Spanish. However, only 30.9% of those who considered 
themselves Spanish were included. It is doubtful that such a small group would be 
representative. A definition of "Spanish" that requires a Spanish first name is simply too 
restrictive in the United States. Even among persons having "narrow" Spanish surnames,
48.1% who classified themselves as Spanish did not have Spanish first names.

Of the simple surname classification procedures (Table 1, lines 1-5), the "narrow"
Spanish-surname test seems to be the best scheme for general statistical procedures. 
Fewer persons are misclassified as Spanish and fewer persons are misclassified overall than 
with the other surname procedures. The results are significant (p<.01). 

USE OF FIRST NAMES 



Attempts to improve the "narrow" surname procedure by additionally coding as
Spanish those persons who meet a first name criterion (Table 1, lines 12-16) were not
particularly successful. In a few cases, the overall number of misclassifications was
reduced; however, the differences were too small to justify the additional computational
effort and were, in any case, not significant.

Table 1, line 7, suggests that the "narrow" surname procedure could be improved by
redefining "narrow" to exclude those surnames not treated as Spanish by the Morton,
Census, or Buechley procedures. The difference in overall number of misclassifications
was not significant (χ² = 1.3); however, the more exclusive procedure reached significance
(p<.05) in testing for differences in the proportion of those classified as Spanish who
were misclassified (χ² = 5.3).

GEOGRAPHIC DIFFERENCES 

When the data are broken out geographically (Table 2) the advantages of a "narrow"
surname classification procedure are still apparent. In general, however, all the name
classification schemes do rather poorly outside the southwestern United States. This raises
the question as to whether persons outside the Southwest who derive from a
Spanish-speaking culture are more likely to have been assimilated into the dominant
culture or whether such persons are less likely to think of themselves as Spanish
regardless of their level of acculturation.

There are not enough cases for further breakdowns within the geographic area.



DIFFERENCES BY EDUCATIONAL LEVEL 

It is apparent that geography is not the only issue in determining ethnic 
classification. Table 3 shows that Spanish-named persons with more than a high school 
education were less likely to think of themselves as Spanish (p<.025). 



DIFFERENCES BY AFQT PERCENTILE 

Table 4 shows similar differences by percentile score on the Armed Forces
Qualification Test (AFQT). At higher AFQT percentiles, persons with Spanish names are
less likely to classify themselves as Spanish (p<.01). The AFQT is primarily a general
aptitude test, rather than an IQ test. It seems reasonable that persons more assimilated
into the dominant culture would score higher on the AFQT and also be less likely to
classify themselves as Spanish. The chi-square statistics are significant (p<.01).

DIFFERENCES BY AGE 

Cross-tabulations by age (Table 5) show no clear trend. In the column for persons 
falsely classified as Spanish, most of the classification schemes show an apparent slight 
trend whereby younger persons with Spanish names are less apt to classify themselves as 
Spanish; however, the Buechley technique shows the opposite trend. Using chi-square tests,
it appears that none of the relationships is significant. 



RATIOS 
A related issue is whether the ratio between the numbers of persons classified as
Spanish by two different techniques varies substantially for different population
subgroups. If the ratios do vary substantially, then care must be exercised in converting
estimates of the number of Spanish-named persons according to one classification scheme
so that they are comparable with the number of Spanish-named persons based upon a
second classification scheme.

As may be seen from Table 6, there are some differences. The ratio of "narrow"
surnamed persons to Census Spanish-surnamed individuals is particularly low outside New
York, New Jersey, Florida, and the five southwestern states. The ratios of
Spanish-surnamed persons by the Buechley technique to Census Spanish-surnamed persons
vary considerably. Outside the Southwest the ratios are particularly high. These ratios also
depend upon AFQT percentile (p<.01) and age (p<.05).

The ratios of persons with Morton surnames to persons with Census surnames differ
very little by population subgroup. This does not seem surprising when one considers that
Morton used the 1960 Census surnames as a starting point for building his name list and
that the Census Bureau subsequently reintroduced many of Morton's additions into the
1970 Census list of Spanish surnames. Although the number of names on Morton's list is
still much larger than those on the 1970 Census list, Morton's additional names occur
infrequently; thus the ratio of persons with Morton surnames to Census surnames is only
slightly larger than 1.

The ratios of persons classified as Spanish to those who classify themselves as
Spanish are shown in Table 7. The ratios depend upon geographic area (p<.01),
educational level (p<.05), and AFQT level (p<.01).
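For illustration, the overall ratios discussed in this section can be formed directly from the "Number Classified As Spanish" column of Table 1. The dictionary keys and function names below are hypothetical, and the conversion helper is only a sketch of the idea of rescaling one scheme's count to another's; as the text cautions, an overall ratio may not hold within a given subgroup.

```python
# Counts of persons classified as Spanish, taken from Table 1.
CLASSIFIED_AS_SPANISH = {
    "narrow": 814,
    "census": 917,
    "morton": 974,
    "buechley": 1163,
}
SELF_CLASSIFIED_SPANISH = 703  # persons who classified themselves as Spanish


def ratio_between(scheme_a, scheme_b, counts=CLASSIFIED_AS_SPANISH):
    """Ratio of persons classified as Spanish by scheme_a to scheme_b."""
    return counts[scheme_a] / counts[scheme_b]


def ratio_to_self_classified(scheme, counts=CLASSIFIED_AS_SPANISH):
    """Ratio of persons classified as Spanish to those self-classified."""
    return counts[scheme] / SELF_CLASSIFIED_SPANISH


def convert_estimate(count_a, scheme_a, scheme_b):
    """Rescale a count obtained under scheme_a to the scale of scheme_b.

    Uses the overall ratio, which is an assumption: the ratio may differ
    for a particular geographic area, AFQT level, or age group.
    """
    return count_a * ratio_between(scheme_b, scheme_a)
```

With these counts the Morton-to-Census ratio is about 1.06, which matches the statement that it is only slightly larger than 1.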



DISCUSSION 

CHOOSING A SUITABLE CLASSIFICATION TECHNIQUE 

Anyone who has built a list of Spanish surnames has probably faced the
embarrassment of finding obvious Spanish surnames not on his list. Perhaps for this
reason most classification schemes err on the side of being too inclusive.

It seems clear from these data that for general statistical purposes the best
computerized procedure for classifying names as Spanish or non-Spanish is a procedure
based on a "narrow" definition of Spanish. This leads to fewer overall misclassifications
and, more importantly, the Spanish group includes a smaller portion of persons who are
not actually Spanish.

Three caveats should be attached to this conclusion. First, it should be pointed out
that computerized coding is not the only alternative. In theory, manual coding can have
fewer misclassifications than the computerized techniques involved here, since additional
information such as accent marks or names of relatives can be utilized. A manual coder
can also accept name variations (the Buechley technique can normally handle name
variations, but the other surname techniques cannot). However, comparing the results for
five southwestern states (Table 2) with Buechley's California results (11), it appears that
manual coding using the 1970 Census list is less accurate than computerized coding. The
problem was not in falsely classifying non-Spanish as Spanish. The results in Buechley's
study and in the present study were not significantly different in this respect (Table 8).

The manual techniques appear, however, to misclassify substantially more Spanish as
non-Spanish, as shown in Table 9. Buechley notes that clerical coding errors of this type
are especially common with names that do not "look" very Spanish.

The second caveat is that "narrow" surname classification is best only at this point
in time. It is quite possible that the Buechley technique may be improved so that the
high proportion of those falsely classified as Spanish may be reduced.^ The Buechley

^ Buechley is, in fact, planning a revised version of his Spanish-surname recognition program.


Table 8

Number of Persons Falsely Classified^a as Spanish
by Study and Coding Technique

                       Buechley's Study (Manual        Present Study (Computerized
Coding Technique       Coding Using Census Surnames)   Coding Using Census Surnames)

Buechley technique                  46                              88
Census surnames                     38^b                            72

χ² = 0.0 (1 df), p < .01

^a In Buechley's study, a false classification was determined by inspection of the names classified as Spanish.

^b Buechley gives this number as 40, since he believed that two of the names on the 1970 census list were not
Spanish. In order to get a valid comparison of manual and computerized techniques, it is necessary to not count these
as errors.



Table 9

Number of Persons Falsely Classified^a as
Non-Spanish by Study and Coding Technique

                       Buechley's Study (Manual        Present Study (Computerized
Coding Technique       Coding Using Census Surnames)   Coding Using Census Surnames)

Buechley technique                  52                              34
Census surnames                    223                              56

χ² = [illegible] (1 df), p < .01

^a In Buechley's study a false classification was determined by inspection of the names classified as Spanish.
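The chi-square statistics under Tables 8 and 9 can be approximated from the four printed counts. The sketch below assumes an ordinary Pearson chi-square for a 2x2 table with no continuity correction; the source does not say which form of the test was used, so this is an assumption, and the function name is hypothetical.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (1 df, no continuity correction) for the table

        a  b
        c  d

    using the shortcut n * (ad - bc)^2 / (product of the four margins).
    """
    n = a + b + c + d
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    if margins == 0:
        raise ValueError("a row or column total is zero")
    return n * (a * d - b * c) ** 2 / margins


# Table 8: persons falsely classified as Spanish.
table8 = chi_square_2x2(46, 88, 38, 72)

# Table 9: persons falsely classified as non-Spanish.
table9 = chi_square_2x2(52, 34, 223, 56)
```

Under this assumption the Table 8 statistic rounds to 0.0, agreeing with the printed value and with the statement that the two studies did not differ in this respect, while the Table 9 statistic is well above 6.63, the critical value for 1 degree of freedom at the .01 level.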



technique already has two advantages in that it does not require an alphabetic sort of the
surnames to be classified, and it has fewer misclassified as non-Spanish.

The third caveat is that for some purposes it may be desirable to use Spanish names
only as a means of restricting attention to a group who may be "Spanish." The definitive
assessment of ethnicity is determined by a follow-up of individuals whose names are
treated as Spanish by the computerized coding technique. In this case, a more inclusive
coding technique (e.g., the Buechley technique) has clear advantages.



DEFINITIVE LIST OF SPANISH SURNAMES 

It should be mentioned that the list of "narrow" Spanish surnames used here or any
known list cannot be considered definitive. There probably are names not on the list 
which should be, and vice versa. 

Interestingly, there is a simple and completely automated procedure for building a
definitive list. Unfortunately the procedure requires a very large magnetic tape file of the
names of persons living in the United States. The definitive list could be constructed
simply by accepting only those surnames possessed by persons who in a high percentage
of cases have Spanish first names.
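A minimal sketch of this list-building procedure follows. The text does not say what counts as "a high percentage," so the `min_share` threshold, the `min_count` guard against rare surnames, and the function name are all assumptions made for the example.

```python
from collections import defaultdict


def build_surname_list(people, spanish_first_names, min_share=0.5, min_count=1):
    """Return surnames whose bearers mostly have Spanish first names.

    people: iterable of (surname, first_name) pairs.
    spanish_first_names: set of upper-case Spanish first names.
    min_share: hypothetical cutoff for "a high percentage of cases".
    min_count: hypothetical minimum number of bearers before a surname
        is considered at all.
    """
    totals = defaultdict(int)   # bearers of each surname
    hits = defaultdict(int)     # bearers with a Spanish first name
    for surname, first_name in people:
        key = surname.strip().upper()
        totals[key] += 1
        if first_name.strip().upper() in spanish_first_names:
            hits[key] += 1
    return {
        surname
        for surname, n in totals.items()
        if n >= min_count and hits[surname] / n >= min_share
    }
```

For example, with first names {"JESUS", "SANTIAGO"} and a file in which two of three persons named Garcia but none named Martin carry one of those first names, a cutoff of 0.5 would accept GARCIA and reject MARTIN.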
USE OF SPANISH SURNAMES OUTSIDE THE SOUTHWEST



Outside the Southwest, the proportion of Spanish-surnamed persons in the study
who did not classify themselves as "Spanish or Mexican American" was so large that one
must ask whether Spanish-surname classification in those areas has any merit at all. If, for
example, a study were conducted to determine the income levels of Spanish-surnamed
college graduates in Minneapolis, probably only a small percent of the study group would
be culturally Spanish.

Whether the situation is as serious as the figures in Table 2 suggest is not clear. It
would seem that the Air Force sample represents a more assimilated group than the
population of Spanish-surnamed persons living in the United States. Also, there are
culturally Spanish persons, particularly Puerto Ricans, who would not want to classify
themselves as "Spanish or Mexican Americans." Nevertheless, the apparent number of
misclassifications is so large that one must proceed with caution, at least until further
studies can examine the backgrounds of Spanish-surnamed persons living outside the
Southwest.

The Bureau of the Census, incidentally, has long contended that Spanish-surname 
classification would not hold up outside the Southwest. This situation may change, 
however, as more Hispanos inhabit those areas. Even now there are undoubtedly local
areas outside the Southwest where the correspondence between Spanish surname and 
Spanish culture is strong. 

Also it should be mentioned that there are study designs where the poor specificity 
of Spanish surname classification can be tolerated. For example, if in certain areas one 
finds employers of blue collar workers who have no persons of Spanish surname on their 
payrolls, there would be good evidence of discriminatory employment practices. The poor 
specificity of Spanish-surname classification, in such a case, becomes a problem in the 
opposite direction. It is possible to have discriminatory employment practices, and still 
employ a substantial number of persons with Spanish surnames. 

EFFECT OF BIAS AND OTHER PROBLEMS IN THE DATA 

The problems of ethnic classification using Spanish surnames are serious enough that
it may well be asked whether some idiosyncrasies in our data or its treatment might have
magnified the problems.

The most obvious bias in the data occurs because Air Force enlisted men are not an
unbiased sample of the U.S. population. Further bias arises from non-response to the 
survey and from the requirement to match the master file on social security number and 
Air Force Specialty Code. 

The bias caused by requiring a match on the Air Force Specialty Code needs no
speculation. The results with and without the Air Force Specialty Code match are shown
in Table 10. This match eliminated persons who were not conscientiously completing
their forms and possibly a small set of miscoded social security numbers which found
matching cases in the master file. Without the match, the apparent problems in the use of
Spanish-surname classifications would increase.

The effect of most other forms of bias would cause the Spanish-named persons 
among the survey respondents to represent a more assimilated group than people in the 
general population. The only effect of the bias is to restrict the range of assimilation in 
the survey data. This could have the effect of increasing the proportion of persons 
misclassified as Spanish in the study, but it should not create the differences observed 
between population subgroups. 
Table 10

Comparison of Misclassification by Coding Technique With and
Without Matching on the Air Force Specialty Code (AFSC)

                       Percent Falsely Classified     Percent Falsely Classified
                             as Spanish^a                  as Non-Spanish^b
                       With AFSC    Without AFSC      With AFSC    Without AFSC
Coding Technique         Match         Match            Match         Match

"Narrow" surname          28.3          30.3             16.9          38.8
Census surname            38.2          40.1             19.4          41.1
Broad surname             40.1          41.8             17.1          39.2
Buechley technique        47.3          48.7             12.8          35.9

^a The denominators used for these percentages were the numbers of persons classified as Spanish by each coding
technique.

^b The denominators used for these percentages were the numbers of persons who classified themselves as Spanish.



The most ticklish problem in the data occurs because of the late appearance of the
ethnic question in the survey— the 52nd question in a survey of 143 questions. It is 
possible that by this stage a sizeable portion of persons were not conscientiously 
completing the questionnaire. 

Nonconscientious marking would in effect create noise in the data. This noise should
not create the geographic differences in the proportion of Spanish-surnamed persons who
classified themselves as Spanish; however, it could have a substantial effect on the
proportion of those marking "Spanish or Mexican American" on the questionnaire who
did not have Spanish names. The difference is that the Spanish-surnamed population does
not depend on the survey results for its definition; however, the population of those
indicating Spanish on the survey does depend on survey results.

The type of effects that rote marking might have on the results may best be seen
from a separate Air Force survey. In the airman survey of July 1971 the same ethnic
question was asked as the 105th of 150 questions, a placement much later than the 52nd
of 143 questions in the March survey. A comparison of the results of the two surveys is
shown in Table 11. While the percent falsely classified as Spanish is approximately the
same in both surveys, the percent falsely classified as non-Spanish differs substantially.

To provide a more realistic estimate of the persons misclassified as non-Spanish, it is
necessary to correct the tabulations in some way. This was done by assuming that among
$S_1$ (the set of persons identifying themselves as Spanish) and $S_2$ (the subset of $S_1$ having
"narrow" Spanish surnames), $P_f$ (the proportion of persons having "narrow" Spanish first
names) should be the same. Any deficit of $P_{f1}$ in $S_1$ under $P_{f2}$ in $S_2$ would be attributed
to careless marking. The number $N_c$ classifying themselves as Spanish through carelessness
may then be estimated by:

$$N_c = N_1 \left(1 - \frac{P_{f1}}{P_{f2}}\right)$$

where $N_1$ is the number of cases in $S_1$. By subtracting $N_c$ from both numerator and
denominator, adjusted estimates may be calculated for the percent of persons falsely
classified as non-Spanish. The same procedure may be followed within each geographic
area, AFQT group, and educational level. The results are shown in Tables 12 and 13.
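The adjustment can be sketched as follows. The printed equation did not survive reproduction, so the form used here, N_c = N_1 (1 - P_f1 / P_f2), is a reconstruction from the verbal definitions above rather than a quotation; it assumes that careless markers essentially never have "narrow" Spanish first names. Function and argument names are hypothetical.

```python
def careless_count(n1, pf1, pf2):
    """Estimate the number who marked Spanish through carelessness.

    n1:  number of persons identifying themselves as Spanish (set S1).
    pf1: proportion of S1 with "narrow" Spanish first names.
    pf2: proportion with "narrow" Spanish first names among the subset of
         S1 that also has "narrow" Spanish surnames (set S2).

    Assumed form: N_c = N_1 * (1 - pf1 / pf2), floored at zero so that a
    surplus of first names in S1 is not read as negative carelessness.
    """
    if pf2 <= 0:
        raise ValueError("pf2 must be positive")
    return max(0.0, n1 * (1.0 - pf1 / pf2))


def adjusted_pct_false_non_spanish(false_non_spanish, n1, n_c):
    """Subtract N_c from numerator and denominator, then take a percent."""
    if n1 - n_c <= 0:
        raise ValueError("adjusted denominator must be positive")
    return 100.0 * (false_non_spanish - n_c) / (n1 - n_c)
```

As a made-up illustration (not study data): with 1,000 self-classified persons, a first-name proportion of 0.36 among them and 0.40 among those with "narrow" surnames, about 100 persons would be attributed to careless marking, and an unadjusted 200 false non-Spanish cases (20.0%) would adjust to 100 of 900, or about 11.1%.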






Table 11

Percent of Persons Falsely Classified as Spanish and Non-Spanish
by Population Subset and Survey

                                  Percent Falsely Classified   Percent Falsely Classified
                                        as Spanish^a                as Non-Spanish^b
Population Subset                     March       July            March       July

All areas                              28.3       28.0             16.9       27.3
Five southwestern states               14.5       16.1             11.9       13.2
New York, New Jersey, Florida          37.1       36.4             24.1       33.0
Other areas                            55.0       61.1             30.8       59.3
State unknown                          35.3       32.6             18.5       37.9
AFQT ≤ 33                              19.7       22.6             17.5       25.3
AFQT 34-67                             28.3       28.5             20.4       32.0
AFQT > 67                              37.9       32.8             14.4       31.2
AFQT unknown                           28.4       28.8             15.1       21.7
Years of school ≤ 12                   26.7       26.8             17.1       27.2
Years of school > 12                   42.9       39.2             15.4       28.4

^a The denominators used for these percentages were the numbers of persons in the population subsets who had a
"narrow" Spanish surname.

^b The denominators used for these percentages were the numbers of persons in the population subsets who
classified themselves as Spanish on the survey.



Table 12 shows that the adjustment procedure does a credible job of explaining
differences between the unadjusted results of the March and July surveys.

Table 13 shows that while the Buechley technique is still the most inclusive of the
Spanish-surname classification procedures, it nevertheless misses almost 8% of those
persons classifying themselves as Spanish. The 8% estimate is, if anything, low, since it
assumes that those who do not have Spanish surnames are as apt to have Spanish first
names as those who do have Spanish surnames. The assumption may not be entirely true.

Tables 12 and 13 also show that the proportion misclassified as non-Spanish depends
upon the geographic area (p<.005) but does not depend on either the AFQT or
educational levels (p>.05). One must, of course, view these results cautiously because of
the indirect procedure used in creating Tables 12 and 13.



SUMMARY AND CONCLUSIONS

Several computerized procedures for classifying names as Spanish or non-Spanish 
were compared, using Air Force survey data. The results of each classification procedure 
were compared with the classifications selected by respondents to the survey. The
conclusions were as follows: 

(1) Outside five southwestern states, Spanish name classifications included
enough persons who did not consider themselves Spanish that the usefulness of the
technique for these areas is seriously reduced.

(2) At higher educational levels and AFQT percentiles, all the surname
classification procedures included increasing proportions of persons who did not consider
themselves Spanish.

(3) Even for the most inclusive surname classification technique, the portion of
Spanish persons who are missed is estimated to be 8% or higher.

(4) There is some evidence that more persons self-identified as Spanish are
missed by the surname classification procedures outside the Southwest.

(5) The best classification procedure for general statistical purposes, the
"narrow" surname technique, required a more exclusive list of Spanish surnames than has
generally been used. This procedure had fewer overall misclassifications and the resulting
Spanish group contained fewer persons who did not consider themselves Spanish.

(6) Future research efforts are outlined to: 

(a) Produce a more definitive list of Spanish surnames. 

(b) Explore improvements in Buechley's technique of classifying 
Spanish names. 

(c) Further examine persons of Spanish surname and culture who do not 
classify themselves as "Spanish or Mexican Americans."
This study was performed to show the validity, or lack of it, of various
coding techniques used to identify persons of Spanish derivation. The
results of computerized methods to identify Spanish names are compared
with responses to questionnaires, in which people identified themselves
as Spanish. Outside of five southwestern states and at higher educational
and aptitude levels, the name recognition procedures include increasing
proportions of persons who do not classify themselves as Spanish. This
problem is mitigated by using a more restrictive list of Spanish surnames
than has been used previously.